class: title-slide, left, bottom

# A Statistical-Modelling Approach to Neural Networks

----

## **Andrew McInerney**, **Kevin Burke**

### University of Limerick

#### Pukyong National University, 27 March 2024

---

# Background

--

<img src="data:image/png;base64,#img/crt-logo.jpg" width="60%" style="display: block; margin: auto;" />

--

* Research: Neural networks from a statistical-modelling perspective

--

<img src="data:image/png;base64,#img/packages.png" width="70%" style="display: block; margin: auto;" />

---

# Agenda

--

- Introduction

--

- Model Selection

--

- Penalised Selection

--

- Stepwise Selection

--

- Model Interpretation

---

class: inverse1 middle center subsection

# Introduction

---

# Background

--

Neural networks originated from attempts to model the human brain.

<br>

--

Early influential papers:

--

- McCulloch and Pitts (1943)

--

- Rosenblatt (1958)

--

- Rumelhart, Hinton and Williams (1986)

---

# Background

Interest within the statistics community grew in the late 1980s and early 1990s.

--

Comprehensive reviews were provided by White (1989), Ripley (1993), and Cheng and Titterington (1994).

--

However, the majority of research took place outside the field of statistics (Breiman, 2001; Hooker and Mentch, 2021).

---

# Background

Renewed interest in merging statistical models and neural networks.

--

From a statistical viewpoint:

--

- Distributional regression (Rügamer et al., 2020, 2021).

--

- Mixed modelling (Tran et al., 2020).
--

From a machine-learning viewpoint:

--

- Neural Additive Models (Agarwal et al., 2020)

---

# Feedforward Neural Networks

--

.left-column[

<img src="data:image/png;base64,#img/fnn-1.jpg" width="100%" style="display: block; margin: auto;" />

]

<br>

--

.right-column[

<img src="data:image/png;base64,#img/nneq1.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" />

]

---

count: false

# Feedforward Neural Networks

.left-column[

<img src="data:image/png;base64,#img/fnn-1.jpg" width="100%" style="display: block; margin: auto;" />

]

<br>

.right-column[

<img src="data:image/png;base64,#img/nneq2.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" />

]

---

count: false

# Feedforward Neural Networks

.left-column[

<img src="data:image/png;base64,#img/fnn-1.jpg" width="100%" style="display: block; margin: auto;" />

]

<br>

.right-column[

<img src="data:image/png;base64,#img/nneq3.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" />

]

---

count: false

# Feedforward Neural Networks

.left-column[

<img src="data:image/png;base64,#img/fnn-1.jpg" width="100%" style="display: block; margin: auto;" />

]

<br>

.right-column[

<img src="data:image/png;base64,#img/nneq4.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" />

]

---

count: false

# Feedforward Neural Networks

.left-column[

<img src="data:image/png;base64,#img/fnn-1.jpg" width="100%" style="display: block; margin: auto;" />

]

<br>

.right-column[

<img src="data:image/png;base64,#img/nneq5.png" width="95%" height="100%" style="display: block; margin: auto 0 auto auto;" />

]

---

# Data Application

--

### Insurance Data (Kaggle)

--

1,338 beneficiaries enrolled in an insurance plan

--

Response: `charges`

--

6 Explanatory Variables: `age`, `sex`, `bmi`, `children`, `smoker`, `region`

---
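# Data Application: Structure in R

Before any fitting, the data is just a standard data frame. A minimal sketch of the structure, using invented rows (the real Kaggle data has 1,338 rows; only the column names and types below match the slides):

```r
# Illustrative stand-in for the insurance data -- the values are made up,
# but the columns mirror the response and six explanatory variables
insurance <- data.frame(
  age      = c(19, 33, 47),
  sex      = factor(c("female", "male", "male")),
  bmi      = c(27.9, 22.7, 30.1),
  children = c(0, 1, 2),
  smoker   = factor(c("yes", "no", "no")),
  region   = factor(c("southwest", "northeast", "southeast")),
  charges  = c(16884.9, 1826.8, 8240.6)  # response
)

str(insurance)
```

---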
# R Implementation

--

Many packages are available to fit neural networks in R.

<br>

--

Some popular packages are:

--

- `nnet`

--

- `neuralnet`

--

- `keras`

--

- `torch`

---

# R Implementation: nnet

--

```r
library(nnet)

nn <- nnet(charges ~ ., data = insurance, size = 2,
           maxit = 2000, linout = TRUE)
summary(nn)
```

--

```{.bg-primary}
## a 8-2-1 network with 21 weights
##  b->h1 i1->h1 i2->h1 i3->h1 i4->h1 i5->h1 i6->h1 i7->h1 i8->h1 
##   1.39  -0.43   0.08   0.03  -0.08  -3.16   0.07   0.11   0.15 
##  b->h2 i1->h2 i2->h2 i3->h2 i4->h2 i5->h2 i6->h2 i7->h2 i8->h2 
##   6.31   0.04   0.13   2.19  -0.11  -6.19   0.15   0.12   0.14 
##   b->o  h1->o  h2->o 
##   1.08  -4.82   2.45 
## [...]
```

---

# Motivation

--

<img src="data:image/png;base64,#img/insurance_mse_gam-1.png" width="75%" style="display: block; margin: auto;" />

---

# Statistical Perspective

--

$$
y_i = \text{NN}(x_i) + \varepsilon_i,
$$

--

where

$$
\varepsilon_i \sim N(0, \sigma^2)
$$

--

$$
\ell(\theta, \sigma^2)= -\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\text{NN}(x_i))^2
$$

---

class: inverse1 middle center subsection

# Model Selection

---

class: inverse1 middle center subsection

## Penalised Selection

---

# Smooth Information Criterion

--

<img src="data:image/png;base64,#img/sic-publication.png" width="100%" style="display: block; margin: auto;" />

---

# Smooth Information Criterion

$$
\text{BIC} = -2\ell(\theta) + \log(n) \left[ \sum_{j=1}^p |\beta_j|^0 + 1 \right]
$$

--

where

`\begin{equation*}
\ell(\theta)= -\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-x_i^T\beta)^2
\end{equation*}`

---

# Smooth Information Criterion

$$
\text{BIC} = -2\ell(\theta) + \log(n) \left[ \sum_{j=1}^p |\beta_j|^0 + 1 \right]
$$

--

Introduce "smooth BIC":

--

$$
\text{SBIC} = -2\ell(\theta) + \log(n) \left[ \sum_{j=1}^p \frac{\beta_j^2}{\beta_j^2 + \epsilon^2} + 1 \right]
$$

---

# Extending to Neural Networks

`$$\mathbb{E}(y) = \text{NN}(X, \theta)$$`

--

where

`$$\text{NN}(X, \theta) = \phi_o \left[ \gamma_0+\sum_{k=1}^q
\gamma_k \phi_h \left( \sum_{j=0}^p \omega_{jk}x_{j}\right) \right]$$`

---

# Extending to Neural Networks

<p style="font-size: 0.85em">
$$
\text{SBIC} = -2\ell(\theta) + \log(n) \left[ \sum_{jk} \frac{\omega_{jk}^2}{\omega_{jk}^2 + \epsilon^2} + \sum_{k} \frac{\gamma_k^2}{\gamma_k^2 + \epsilon^2} + q + 1 \right]
$$
</p>

--

where

<p style="font-size: 1em">
$$
\ell(\theta) = -\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\text{NN}(x_i))^2
$$
</p>

---

# Simulation Setup

<img src="data:image/png;base64,#img/sim-setup.png" width="50%" style="display: block; margin: auto;" />

---

# Results

<img src="data:image/png;base64,#img/sim-single-results.png" width="90%" style="display: block; margin: auto;" />

---

# Extending to Group Sparsity

--

.pull-left[

<img src="data:image/png;base64,#img/input-group.png" width="100%" style="display: block; margin: auto;" />

]

--

Single penalty:

`\begin{equation*}
\frac{\omega_{jk}^2}{\omega_{jk}^2 + \epsilon^2}
\end{equation*}`

--

Group penalty:

$$
\text{card}(\omega_j) \times \frac{||\omega_j||_2^2}{||\omega_j||_2^2 + \epsilon^2}
$$

---

class: inputgroup-slide

# Group Sparsity

## Input-neuron penalization

<p style="font-size: 0.78em">
$$
\text{IN-SBIC} = -2\ell(\theta) + \log(n) \left[ q \times \sum_{j}\frac{||\omega_j||_2^2}{||\omega_j||_2^2 + \epsilon^2} + \sum_{k} \frac{\gamma_k^2}{\gamma_k^2 + \epsilon^2} + q + 1 \right]
$$
</p>

where `\(\omega_{j} = (\omega_{j1},\omega_{j2},\dotsc,\omega_{jq})^T\)`

---

class: hiddengroup-slide

# Group Sparsity

## Hidden-neuron penalization

<p style="font-size: 0.78em">
$$
\text{HN-SBIC} = -2\ell(\theta) + \log(n) \left[ (p + 1) \times \sum_{k}\frac{||\theta^{(k)}||_2^2}{||\theta^{(k)}||_2^2 + \epsilon^2} + q + 1 \right]
$$
</p>

where `\(\theta^{(k)} = (\omega_{1k},\omega_{2k},\dotsc,\omega_{pk}, \gamma_k)^T\)`

---

# Simulation Setup

<img src="data:image/png;base64,#img/sim-setup.png" width="50%" style="display: block; margin: auto;" />

---

# Results (IN-SBIC)

<img
src="data:image/png;base64,#img/sim-input-results.png" width="90%" style="display: block; margin: auto;" />

---

# Data Application: Results

<img src="data:image/png;base64,#img/insurance.png" width="100%" style="display: block; margin: auto;" />

---

class: inverse1 middle center subsection

## Stepwise Selection

---

class: selectnn-slide

# Model Selection

<img src="data:image/png;base64,#img/modelsel.png" width="90%" style="display: block; margin: auto;" />

A Statistically-Based Approach to Feedforward Neural Network Model Selection (arXiv:2207.04248)

---

class: selectnn-slide

# Insurance: Model Selection

```r
library(selectnn)

nn <- selectnn(charges ~ ., data = insurance, Q = 8,
               n_init = 5)
summary(nn)
```

--

```{.bg-primary}
## [...]
## Number of input nodes: 4
## Number of hidden nodes: 2
## 
## Value: 1218.738
##  Covariate Selected Delta.BIC
## smoker.yes      Yes  2474.478
##        bmi      Yes   919.500
##        age      Yes   689.396
##   children      Yes    13.702
## [...]
```

---

class: inverse1 middle center subsection

# Model Interpretation

---

# Proposed Solution: interpretnn

--

.left-column[

<br>

<img src="data:image/png;base64,#img/interpretnn.png" width="80%" style="display: block; margin: auto;" />

]

--

.right-column[

<br>
<br>

.small[

```r
# install.packages("devtools")
library(devtools)
install_github("andrew-mcinerney/interpretnn")
```

]

]

---

# Significance Testing

--

.pull-left[

<img src="data:image/png;base64,#img/fnn-1.jpg" width="98.9%" style="display: block; margin: auto;" />

]

---

count: false

# Significance Testing

.pull-left[

<img src="data:image/png;base64,#img/fnn2-1.jpg" width="100%" style="display: block; margin: auto;" />

]

--

.pull-right[

Wald test:

{{content}}

]

--

$$
`\begin{equation}
\omega_j = (\omega_{j1},\omega_{j2},\dotsc,\omega_{jq})^T
\end{equation}`
$$

{{content}}

--

$$
`\begin{equation}
H_0: \omega_j = 0
\end{equation}`
$$

{{content}}

--

$$
`\begin{equation}
(\hat{\omega}_{j} - \omega_j)^T\Sigma_{\hat{\omega}_{j}}^{-1}(\hat{\omega}_{j} - \omega_j)
\overset{\mathcal{D}}{\longrightarrow} \chi^2_q
\end{equation}`
$$

{{content}}

---

# Insurance: Model Summary

```r
intnn <- interpretnn(nn)

summary(intnn)
```

--

```{.bg-primary}
## Coefficients:
##                              Weights |      X^2 Pr(> X^2)    
## age                (-0.43***, 0.04)  |  41.4363  1.01e-09 ***
## sex.male              (0.08*, 0.13)  |   5.5055  6.38e-02 .  
## bmi                 (0.03, 2.19***)  | 105.6106  0.00e+00 ***
## children         (-0.08***, -0.11.)  |  19.0146  7.43e-05 ***
## smoker.yes     (-3.16***, -6.19***)  | 250.6393  0.00e+00 ***
## region.northwest      (0.07., 0.15)  |   2.8437  2.41e-01    
## region.southeast      (0.11*, 0.12)  |   6.2560  4.38e-02 *  
## region.southwest     (0.15**, 0.14)  |  10.8218  4.47e-03 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

# Insurance: Model Summary

```r
plotnn(intnn)
```

--

<img src="data:image/png;base64,#img/plotnn-1.png" width="90%" style="display: block; margin: auto;" />

---

# Covariate-Effect Plots

--

$$
`\begin{equation}
\widehat{\overline{\text{NN}}}_j(x) = \frac{1}{n}\sum_{i=1}^n \text{NN}(x_{i,1}, \ldots, x_{i,j-1}, x, x_{i,j+1}, \ldots, x_{i,p})
\end{equation}`
$$

--

Covariate-effect plots of the following form:

--

$$
`\begin{equation}
\hat{\beta}_j(x,d) = \widehat{\overline{\text{NN}}}_j(x + d) - \widehat{\overline{\text{NN}}}_j(x)
\end{equation}`
$$

--

Usually set `\(d = \text{SD}(x_j)\)`

---

# Insurance: Covariate Effects

```r
plot(intnn, conf_int = TRUE, which = c(1, 4))
```

--

.pull-left[

<img src="data:image/png;base64,#img/PCE_age-1.png" width="90%" style="display: block; margin: auto;" />

]

--

.pull-right[

<img src="data:image/png;base64,#img/PCE_children-1.png" width="90%" style="display: block; margin: auto;" />

]

---

# Summary

--

- Treat neural networks as statistical models

--

- Perform penalised and stepwise model selection

--

- Use hypothesis tests and covariate-effect plots for interpretation

---

class: bigger

# References

* <font size="5">McInerney, A., & Burke, K. (2022). A statistically-based approach to feedforward neural network model selection.
<i>arXiv preprint arXiv:2207.04248</i>. </font>

* <font size="5">McInerney, A., & Burke, K. (2023). Interpreting feedforward neural networks as statistical models. <i>arXiv preprint arXiv:2311.08139</i>. </font>

* <font size="5">McInerney, A., & Burke, K. (2024). Combining a smooth information criterion with neural networks. <i>To appear on arXiv</i>. </font>

### R Packages

```r
devtools::install_github(c("andrew-mcinerney/selectnn",
                           "andrew-mcinerney/interpretnn"))
```
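As a closing illustration, the smooth penalty at the heart of the SBIC approach is only a couple of lines of base R; the function name `smooth_l0` and the value of `eps` here are illustrative choices, not taken from the packages above.

```r
# Each weight w contributes w^2 / (w^2 + eps^2) to the penalty:
# approximately 0 when w = 0 and approximately 1 when |w| >> eps,
# so the sum smoothly approximates the number of nonzero weights
smooth_l0 <- function(w, eps = 1e-2) w^2 / (w^2 + eps^2)

smooth_l0(c(0, 0.001, 0.5, -2))
```

Unlike the exact zero-norm count in the BIC, this penalty is differentiable, so it can be minimised jointly with the negative log-likelihood by gradient-based optimisers.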
<font size="5.5">andrew-mcinerney</font>
<font size="5.5">@amcinerney_</font>
<font size="5.5">andrew.mcinerney@ul.ie</font>